CS229: The Netflix Project

Authors

  • Jane Huang
  • Jack Kamm
  • Conal Sathi
Abstract

This paper investigates combining and applying several machine learning techniques to the Netflix Challenge. In addition to the Netflix training set, the algorithm uses a mapping from Netflix movies to features gleaned from IMDB, such as director and genre. The algorithm first uses k-means clustering to group users according to the IMDB features of each movie. We made predictions using three main methods: Naive Bayes alone on the IMDB features, the average rating that a cluster gives to a movie, and a combination of Naive Bayes with clustering.
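The cluster-average method mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each user is summarized by a hypothetical profile vector (here, the user's mean rating per genre), uses a plain Lloyd's k-means seeded with the first k points, and falls back to the global mean when nobody in the user's cluster rated the movie. All names and the toy data (`user_profiles`, `ratings`, `predict_cluster_average`) are invented for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (an arbitrary choice)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def predict_cluster_average(ratings, cluster_of, user, movie):
    """Mean rating the user's cluster gave the movie, else the global mean."""
    same_cluster = [
        r for (u, m), r in ratings.items()
        if m == movie and u != user and cluster_of[u] == cluster_of[user]
    ]
    if same_cluster:
        return sum(same_cluster) / len(same_cluster)
    return sum(ratings.values()) / len(ratings)


# Hypothetical toy data: each profile is [mean action rating, mean romance rating].
user_profiles = {
    "ann": [5.0, 1.0],
    "cat": [1.0, 5.0],
    "bob": [4.5, 1.5],
    "dan": [1.5, 4.5],
}
ratings = {
    ("ann", "Heat"): 5,
    ("cat", "Heat"): 2,
    ("dan", "Heat"): 1,
    ("cat", "Amelie"): 5,
}

users = list(user_profiles)
labels = kmeans([user_profiles[u] for u in users], k=2)
cluster_of = dict(zip(users, labels))

# bob shares a cluster with ann, so his predicted rating for "Heat" is ann's rating.
prediction = predict_cluster_average(ratings, cluster_of, "bob", "Heat")
print(cluster_of, prediction)
```

In this sketch a prediction depends only on cluster membership, which is why the abstract's third method, combining clustering with Naive Bayes over the IMDB features, is a natural extension when a cluster has few or no ratings for a movie.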


Similar resources

Exploring collaborative filters: Neighborhood-based approach

In this project, we study the effectiveness of collaborative filtering mechanisms in the context of the Netflix competition. We focus our attention on a dataset provided by Netflix which includes a training set with more than 100 million 4-tuples: user id, movie id, rating, and date [3]. In the first part of this project, we develop a simple model to predict future ratings of users based on the...


Plead or Pitch? Predicting the Performance of Kickstarter Projects

In this CS229 project, using 26K project proposals from crowdfunding platform Kickstarter, we evaluate the performance of different models (logistic regression, SVMs among others) on predicting whether a project will meet its funding goal or not, with a particular focus on features derived from the language used in project pitches. We then contrast the performance of our best model when using d...


CS229 Project: TLS, Using Learning to Speculate

We apply machine learning to thread level speculation, a future hardware framework for parallelizing sequential programs. By using machine learning to determine the parallel regions, the overall performance is nearly as good as the best heuristics for each application.



Publication date: 2008